Reading and Writing files using Pandas

Pandas is one of the most popular Python libraries. It provides a user-friendly interface for reading, displaying and writing files, and it also offers additional features such as plotting, time series analysis and missing-value handling.



In [1]:

    
import pandas as pd

Part 1: reading a .csv file



In [2]:

    
data_csv = pd.read_csv("titanic.csv")



In [4]:

    
data_csv.head()









    Out[4]:

   PassengerId  Survived  Pclass  Name                                               Sex      Age  SibSp  Parch  Ticket               Fare  Cabin  Embarked
0            1         0       3  Braund, Mr. Owen Harris                            male    22.0      1      0  A/5 21171          7.2500  NaN    S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0  PC 17599          71.2833  C85    C
2            3         1       3  Heikkinen, Miss. Laina                             female  26.0      0      0  STON/O2. 3101282   7.9250  NaN    S
3            4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0      1      0  113803            53.1000  C123   S
4            5         0       3  Allen, Mr. William Henry                           male    35.0      0      0  373450             8.0500  NaN    S

Part 2: reading .txt files

CSV stands for Comma-Separated Values: the values in a .csv file are separated by commas. Values in .txt files are often separated by tabs ("\t") instead, in which case the file is called a tab-separated file. To read .txt files in pandas we use the same read_csv() function, but this time we pass another argument besides the name of the file: the separator, sep. Note that the call below passes a single space, so each line of the lyrics file is split on spaces; this is why the result has ragged columns and a stray \t in one of the cells.



In [6]:

    
data_txt = pd.read_csv("imagine_lyrics.txt", sep=" ")



In [7]:

    
data_txt.head()









    Out[7]:

   Imagine      by  John  LennonImagine  all  the  people,  Unnamed: 7
0   living    life    in       peace...  NaN  NaN      NaN         NaN
1   \tJohn  Lennon   NaN            NaN  NaN  NaN      NaN         NaN
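For comparison, here is a minimal sketch of reading genuinely tab-separated data with sep="\t". The sample text and the column names (name, age, city) are made up for illustration, and io.StringIO stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical tab-separated content standing in for a .txt file on disk.
tsv_text = "name\tage\tcity\nAnna\t31\tYerevan\nDavit\t27\tGyumri\n"

# sep="\t" tells read_csv to split each line on tab characters.
people = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(people.shape)
print(list(people.columns))
```

With a real file, the first argument would simply be the file name, as in the cells above.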

Part 3: reading .html files

Pandas also has a read_html() function, similar to read_csv(), which reads HTML files. Both functions can read files directly from a URL. Let's use the URL of careercenter to read the page content provided in HTML.



In [9]:

    
data_html = pd.read_html("https://careercenter.am/")









    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-495f8adc6296> in <module>()
----> 1 data_html = pd.read_html("https://careercenter.am/")

C:\Program Files\Anaconda2\lib\site-packages\pandas\io\html.pyc in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na)
    894                   thousands=thousands, attrs=attrs, encoding=encoding,
    895                   decimal=decimal, converters=converters, na_values=na_values,
--> 896                   keep_default_na=keep_default_na)

C:\Program Files\Anaconda2\lib\site-packages\pandas\io\html.pyc in _parse(flavor, io, match, attrs, encoding, **kwargs)
    731             break
    732     else:
--> 733         raise_with_traceback(retained)
    734 
    735     ret = []

C:\Program Files\Anaconda2\lib\site-packages\pandas\io\html.pyc in _parse(flavor, io, match, attrs, encoding, **kwargs)
    725 
    726         try:
--> 727             tables = p.parse_tables()
    728         except Exception as caught:
    729             retained = caught

C:\Program Files\Anaconda2\lib\site-packages\pandas\io\html.pyc in parse_tables(self)
    194 
    195     def parse_tables(self):
--> 196         tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
    197         return (self._build_table(table) for table in tables)
    198 

C:\Program Files\Anaconda2\lib\site-packages\pandas\io\html.pyc in _parse_tables(self, doc, match, attrs)
    424 
    425         if not tables:
--> 426             raise ValueError('No tables found')
    427 
    428         result = []

ValueError: No tables found

As you can see, we receive an error here. The problem is that read_html() reads only HTML tables (<table> elements), and no table could be found on the careercenter webpage. If you check the source of the website, you will see that it has no content of its own: the content is generated through another file called ccidxann.php. This means we should copy the link to that file and scrape it instead.



In [11]:

    
data_html = pd.read_html("https://careercenter.am/ccidxann.php")



In [12]:

    
data_html.head()









    



---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-035a0e7d72b1> in <module>()
----> 1 data_html.head()

AttributeError: 'list' object has no attribute 'head'

The head() function cannot be used here, because read_html() returns a list of dataframes (one per table found) rather than a single dataframe. So let's just print it.



In [13]:

    
print(data_html)









    



[                     0                                                  1
0    JOB OPPORTUNITIES                                                NaN
1                  NaN                     Chief Accountant / Noyan Tapan
2                  NaN  Leading Loan Specialist of Microcredit Block i...
3                  NaN                Senior Internal Auditor / FINCA UCO
4                  NaN                     Credit Officer / Prometey Bank
5                  NaN  Director / Civic Development and Partnership F...
6                  NaN                  Finance Director / Reso Insurance
7                  NaN  FTTB, ADSL/ VDSL Networks Monitoring Technical...
8                  NaN               Digital Platforms Manager / ArmenTel
9                  NaN                           Consultant/ Seller / TST
10                 NaN      Operations Research Developer / Optym Armenia
11                 NaN  Product Manager / Berlin-Chemie Armenian Repre...
12                 NaN               Policy Analyst / UNDP Armenia Office
13                 NaN                           Front-End Developer / 4H
14                 NaN  Specialist of Reconciliation Division / ArmSwi...
15                 NaN  Specialist of Loans Processing and Reporting D...
16                 NaN                      Accountant / Zeppelin Armenia
17                 NaN               Head of Digital Banking / Ameriabank
18                 NaN                                Data Analyst / IPSC
19                 NaN  Account Manager, Client Service Department / M...
20                 NaN     Digital Marketing Specialist / McCann Erickson
21                 NaN  Medical Representative/ Medical Equipment Spec...
22                 NaN  Head of Finance Management/ Chief Accountant /...
23                 NaN             Mobile UI/ UX Designer / Prometey Bank
24                 NaN                        Receptionist / Envoy Hostel
25                 NaN  Consultant on Cost Benefit Analysis of Alterna...
26                 NaN              Digital Innovations Specialist / Ucom
27                 NaN                    Graphic Designer / Baldi Retail
28                 NaN  Head of Operational Risk Assessment and Monito...
29                 NaN  Head of Operational Risk Management Department...
..                 ...                                                ...
119                NaN             Senior JavaScript Developer / Digitain
120                NaN                         JavaScript Developer / SFL
121                NaN                    Procurement Manager / Telia-Med
122                NaN         User Behavior Research Scientist / PicsArt
123                NaN         Deep Learning Research Scientist / PicsArt
124                NaN               Senior Software Developer / XNTrends
125                NaN                       UI/ UX Designer / IUNetworks
126                NaN             Engineering Director / Ginosi Apartels
127                NaN          Senior Systems Engineer / Ginosi Apartels
128                NaN         Senior Android Developer / Ginosi Apartels
129                NaN               Senior Java Developer / EPAM Systems
130                NaN              Digital Marketing Specialist / Lesona
131                NaN             Rental Agent for "Sixt" Armenia / Fora
132                NaN                  IT Project Coordinator / Altacode
133                NaN  Head of Technical Production Department / Doro...
134                NaN  Application Engineer, Place and Route Departme...
135                NaN  Software Engineer / Mentor Graphics Developmen...
136                NaN                Director of Engineering / Workfront
137                NaN                 CNC Machine Operator / Carrara Rus
138                NaN                          Storekeeper / Carrara Rus
139                NaN                           Accountant / Carrara Rus
140                NaN                        Technician/ Installer / TST
141                NaN                           Accounting Manager / TST
142                NaN                       IT Specialist / ArmSwissBank
143                NaN             Road Construction Engineer / Dorozhnik
144                NaN  Digital Marketing Specialist / Andava Digital ...
145                NaN       Stand Customer Service Specialist / Varks.am
146                NaN               Doctor Expert / Rosgosstrakh-Armenia
147                NaN                       WordPress Developer / Reload
148                NaN                   Mechanical Engineer / Imex Group

[149 rows x 2 columns],              0                                          1
0  INTERNSHIPS                                        NaN
1          NaN          Branch Intern / HSBC Bank Armenia
2          NaN  Contact Center Intern / HSBC Bank Armenia,            0                                         1
0  TRAININGS                                       NaN
1        NaN  English Language Courses / Career Center,               0                                                  1
0  COMPETITIONS                                                NaN
1           NaN  Invitation to Bid - ITB/ARM/01/2017 - Sale of ...
2           NaN  Call for Designing Companies for SMEDA Project...]

We may check the length of the list to see how many elements it has. Each element is one separate table.



In [14]:

    
len(data_html)









    Out[14]:





4
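The same idea can be sketched without a network connection. The two small dataframes below are hypothetical stand-ins for the tables that read_html() returned; the point is only that the list itself has no head(), while each of its elements does.

```python
import pandas as pd

# Hypothetical stand-ins for the tables returned by read_html().
tables = [
    pd.DataFrame({0: ["JOB OPPORTUNITIES", None], 1: [None, "Data Analyst / IPSC"]}),
    pd.DataFrame({0: ["INTERNSHIPS", None], 1: [None, "Branch Intern / HSBC Bank Armenia"]}),
]

# Summarise each table: its position in the list, its shape and its heading cell.
summary = [(i, t.shape, t.iloc[0, 0]) for i, t in enumerate(tables)]
for row in summary:
    print(row)

# Each element is a DataFrame, so head() works once we index into the list.
first_rows = tables[0].head(1)
print(first_rows)
```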



In [15]:

    
data_html[0]









    Out[15]:

                      0                                                  1
0    JOB OPPORTUNITIES                                                NaN
1                  NaN                     Chief Accountant / Noyan Tapan
2                  NaN  Leading Loan Specialist of Microcredit Block i...
3                  NaN                Senior Internal Auditor / FINCA UCO
4                  NaN                     Credit Officer / Prometey Bank
5                  NaN  Director / Civic Development and Partnership F...
6                  NaN                  Finance Director / Reso Insurance
7                  NaN  FTTB, ADSL/ VDSL Networks Monitoring Technical...
8                  NaN               Digital Platforms Manager / ArmenTel
9                  NaN                           Consultant/ Seller / TST
10                 NaN      Operations Research Developer / Optym Armenia
11                 NaN  Product Manager / Berlin-Chemie Armenian Repre...
12                 NaN               Policy Analyst / UNDP Armenia Office
13                 NaN                           Front-End Developer / 4H
14                 NaN  Specialist of Reconciliation Division / ArmSwi...
15                 NaN  Specialist of Loans Processing and Reporting D...
16                 NaN                      Accountant / Zeppelin Armenia
17                 NaN               Head of Digital Banking / Ameriabank
18                 NaN                                Data Analyst / IPSC
19                 NaN  Account Manager, Client Service Department / M...
20                 NaN     Digital Marketing Specialist / McCann Erickson
21                 NaN  Medical Representative/ Medical Equipment Spec...
22                 NaN  Head of Finance Management/ Chief Accountant /...
23                 NaN             Mobile UI/ UX Designer / Prometey Bank
24                 NaN                        Receptionist / Envoy Hostel
25                 NaN  Consultant on Cost Benefit Analysis of Alterna...
26                 NaN              Digital Innovations Specialist / Ucom
27                 NaN                    Graphic Designer / Baldi Retail
28                 NaN  Head of Operational Risk Assessment and Monito...
29                 NaN  Head of Operational Risk Management Department...
..                 ...                                                ...
119                NaN             Senior JavaScript Developer / Digitain
120                NaN                         JavaScript Developer / SFL
121                NaN                    Procurement Manager / Telia-Med
122                NaN         User Behavior Research Scientist / PicsArt
123                NaN         Deep Learning Research Scientist / PicsArt
124                NaN               Senior Software Developer / XNTrends
125                NaN                       UI/ UX Designer / IUNetworks
126                NaN             Engineering Director / Ginosi Apartels
127                NaN          Senior Systems Engineer / Ginosi Apartels
128                NaN         Senior Android Developer / Ginosi Apartels
129                NaN               Senior Java Developer / EPAM Systems
130                NaN              Digital Marketing Specialist / Lesona
131                NaN             Rental Agent for "Sixt" Armenia / Fora
132                NaN                  IT Project Coordinator / Altacode
133                NaN  Head of Technical Production Department / Doro...
134                NaN  Application Engineer, Place and Route Departme...
135                NaN  Software Engineer / Mentor Graphics Developmen...
136                NaN                Director of Engineering / Workfront
137                NaN                 CNC Machine Operator / Carrara Rus
138                NaN                          Storekeeper / Carrara Rus
139                NaN                           Accountant / Carrara Rus
140                NaN                        Technician/ Installer / TST
141                NaN                           Accounting Manager / TST
142                NaN                       IT Specialist / ArmSwissBank
143                NaN             Road Construction Engineer / Dorozhnik
144                NaN  Digital Marketing Specialist / Andava Digital ...
145                NaN       Stand Customer Service Specialist / Varks.am
146                NaN               Doctor Expert / Rosgosstrakh-Armenia
147                NaN                       WordPress Developer / Reload
148                NaN                   Mechanical Engineer / Imex Group

149 rows × 2 columns



In [16]:

    
data_html[1]









    Out[16]:

             0                                          1
0  INTERNSHIPS                                        NaN
1          NaN          Branch Intern / HSBC Bank Armenia
2          NaN  Contact Center Intern / HSBC Bank Armenia



In [17]:

    
data_html[2]









    Out[17]:

           0                                         1
0  TRAININGS                                       NaN
1        NaN  English Language Courses / Career Center



In [18]:

    
data_html[3]









    Out[18]:

              0                                                  1
0  COMPETITIONS                                                NaN
1           NaN  Invitation to Bid - ITB/ARM/01/2017 - Sale of ...
2           NaN  Call for Designing Companies for SMEDA Project...

Let's take only the job postings table, which has 2 columns like all the others. Apart from the heading in its first row, the first column has only NaN values, so we will choose only the second one and save it as our data for analysis.



In [19]:

    
data = data_html[0][1]

Now we have a single column, which pandas stores as a Series rather than a dataframe. It can be used together with head() and many other functions in the same way.



In [20]:

    
data.head()









    Out[20]:





0                                                  NaN
1                       Chief Accountant / Noyan Tapan
2    Leading Loan Specialist of Microcredit Block i...
3                  Senior Internal Auditor / FINCA UCO
4                       Credit Officer / Prometey Bank
Name: 1, dtype: object
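As a small sketch of what could come next, the snippet below shows that selecting one column yields a Series, drops the empty first entry, and turns the result back into a one-column dataframe. The three-row table is a hypothetical stand-in for data_html[0], and the column name "posting" is chosen only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the job postings table.
table = pd.DataFrame({
    0: ["JOB OPPORTUNITIES", None, None],
    1: [None, "Chief Accountant / Noyan Tapan", "Data Analyst / IPSC"],
})

postings = table[1]                               # one column -> a Series
clean = postings.dropna().reset_index(drop=True)  # drop the empty heading row
frame = clean.to_frame(name="posting")            # back to a one-column DataFrame

print(type(postings).__name__)
print(frame)
```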

Part 4: reading other files

Pandas also has functions for reading Excel, Stata, SAS, JSON, SQL and other files. You may check the official documentation for details.
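As one illustration, JSON can be read with read_json(), which follows the same pattern as read_csv(). This is only a sketch: the JSON text and its field names (title, company) are made up, and io.StringIO stands in for a file or URL.

```python
import io
import pandas as pd

# Hypothetical JSON content: a list of records, one per row.
json_text = (
    '[{"title": "Data Analyst", "company": "IPSC"},'
    ' {"title": "Accountant", "company": "Carrara Rus"}]'
)

# Each record becomes a row and each key becomes a column.
jobs = pd.read_json(io.StringIO(json_text))
print(jobs)
```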

Part 5: writing to files

Writing in Pandas is as easy as reading. You just need to use another function, to_csv() (in the case of CSV files). Let's take a look at it.



In [21]:

    
data.to_csv("careercenter_data.csv")

We may now go to our folder to check the csv file.
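To check the result without leaving Python, one option is to write the data and read it straight back. The sketch below assumes a small made-up dataframe and writes to an in-memory buffer instead of a file; index=False is an optional argument that leaves the row labels out of the output.

```python
import io
import pandas as pd

# Hypothetical data standing in for the scraped job postings.
postings = pd.DataFrame(
    {"posting": ["Chief Accountant / Noyan Tapan", "Data Analyst / IPSC"]}
)

buffer = io.StringIO()                # in-memory stand-in for a file on disk
postings.to_csv(buffer, index=False)  # index=False omits the row labels
csv_text = buffer.getvalue()

# Reading the text back gives the same table we started with.
restored = pd.read_csv(io.StringIO(csv_text))
print(csv_text)
```

Without index=False, the row labels are written as an extra first column, which then shows up as an unnamed column when the file is read again.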
